Bulgarian-English Parallel Treebank: Word and Semantic Level Alignment

Authors

Kiril Simov

Petya Osenova

Aleksandar Savkov

Stanislava Kancheva

Abstract

The paper describes the basic strategies behind the word and semantic level alignment in the Bulgarian-English treebank. The word level alignment takes into account the experience of other NLP groups in light of the language-specific features of Bulgarian. The semantic level alignment builds on the word level alignment and is represented in the framework of Minimal Recursion Semantics...
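The abstract states that the semantic level alignment is built on top of the word level alignment, but it does not spell out the procedure. As a minimal sketch, assuming each MRS elementary predication (EP) is anchored to the indices of the surface tokens it was derived from, one naive way to derive EP-level links is to connect two EPs whenever any of their anchoring tokens are word-aligned. The `EP` class, the `project_alignment` function and the predicate labels are hypothetical illustrations, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EP:
    """Hypothetical MRS elementary predication anchored to surface tokens."""
    label: str               # predicate name, e.g. "dog_n" (illustrative only)
    tokens: frozenset[int]   # indices of the tokens this EP is anchored to


def project_alignment(
    word_links: set[tuple[int, int]],
    bg_eps: list[EP],
    en_eps: list[EP],
) -> list[tuple[str, str]]:
    """Naively project word alignments onto EP pairs.

    A Bulgarian EP is linked to an English EP when at least one of the
    Bulgarian EP's tokens is word-aligned to one of the English EP's tokens.
    word_links holds (bulgarian_token_index, english_token_index) pairs.
    """
    links: list[tuple[str, str]] = []
    for b in bg_eps:
        for e in en_eps:
            if any((i, j) in word_links for i in b.tokens for j in e.tokens):
                links.append((b.label, e.label))
    return links


# Bulgarian "Кучето лае" (tokens 0-1) vs. English "The dog barks" (tokens 0-2).
bg = [EP("def_q", frozenset({0})), EP("dog_n", frozenset({0})), EP("bark_v", frozenset({1}))]
en = [EP("the_q", frozenset({0})), EP("dog_n", frozenset({1})), EP("bark_v", frozenset({2}))]
print(project_alignment({(0, 0), (0, 1), (1, 2)}, bg, en))
```

In Bulgarian the definite article is a suffix on *Кучето*, so token 0 is aligned to both *The* and *dog*, and this naive rule links both Bulgarian EPs anchored to token 0 with both corresponding English EPs. A practical procedure would need additional constraints, for example matching quantifiers only with quantifiers.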


Similar resources

Linguistic Issues in Language Technology – LiLT

The paper describes the construction of a Bulgarian-English treebank aligned on the word and semantic level. We consider the manual word level alignment easier and more reliable than the manual alignment on syntactic and semantic level. Thus, after manual word level alignment we apply an automatic procedure for the construction of semantic level alignments. Our work presents the main steps of t...


Creating Arabic-English Parallel Word-Aligned Treebank Corpora at LDC

This contribution describes an Arabic-English parallel word aligned treebank corpus from the Linguistic Data Consortium that is currently under production. Herein we primarily focus on efforts required to assemble the package and instructions for using it. It was crucial that word alignment be performed on tokens produced during treebanking to ensure cohesion and greater utility of the corpus. ...


Language engineering for syntactic knowledge transfer

In this paper we present a method for an English-Romanian treebank construction, together with the obtained evaluation results. The treebank is built upon a parallel English-Romanian corpus word-aligned and annotated at the morphological and syntactic level. The syntactic trees of the Romanian texts are generated by considering the syntactic phrases of the English parallel texts automatically r...


Ontology-Supported Text Classification Based on Cross-Lingual Word Sense Disambiguation

The paper reports on recent experiments in cross-lingual document processing (with a case study for Bulgarian-English-Romanian language pairs) and brings evidence on the benefits of using linguistic ontologies for achieving, with a high level of accuracy, difficult tasks in NLP such as word alignment, word sense disambiguation, document classification, cross-language information retrieval, etc....


A Model for Fine-Grained Alignment of Multilingual Texts

While alignment of texts on the sentential level is often seen as being too coarse, and word alignment as being too fine-grained, bi- or multilingual texts which are aligned on a level in between are a useful resource for many purposes. Starting from a number of examples of non-literal translations, which tend to make alignment difficult, we describe an alignment model which copes with these cases...




Publication date: 2011
